Chapter 18

A Yes-or-No Proposition: Logistic Regression

IN THIS CHAPTER

Figuring out when to use logistic regression

Getting a grip on the basics of logistic regression

Running a logistic regression model and making sense of the output

Watching for common issues with logistic regression

Estimating the sample size you need for logistic regression

You can use logistic regression to analyze the relationship between one or more predictor variables (the X variables) and a categorical outcome variable (the Y variable). Typical categorical outcomes include the following two-level variables (which are also called binary or dichotomous):

Lived or died by a certain date

Did or didn’t get diagnosed with Type II diabetes

Responded or didn’t respond to a treatment

Did or didn’t choose a particular health insurance plan
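As a minimal sketch of what a two-level outcome looks like in practice, the example below codes a yes/no outcome as 1/0 alongside one numeric and one binary predictor. The records, the variable names, the helper `to_binary`, and the 1 = event coding are all hypothetical choices made for illustration; they are not data or conventions taken from this chapter.

```python
# Hypothetical records (illustration only):
# (age in years, obese?, diagnosed with Type II diabetes?)
records = [
    (34, "no", "no"),
    (51, "yes", "yes"),
    (47, "yes", "no"),
    (62, "no", "yes"),
    (58, "yes", "yes"),
]


def to_binary(value, event_label="yes"):
    """Return 1 if value equals the event label, otherwise 0."""
    return 1 if value == event_label else 0


# Y: the binary outcome, coded 1 = diagnosed, 0 = not diagnosed
y = [to_binary(diagnosed) for _, _, diagnosed in records]

# X: one numeric predictor (age) and one two-level predictor (obesity status)
x = [(age, to_binary(obese)) for age, obese, _ in records]

# Share of records in which the outcome event occurred
proportion_diagnosed = sum(y) / len(y)

print(y)
print(x)
print(proportion_diagnosed)
```

Coding the event of interest as 1 is a common convention rather than a requirement; what matters is that the outcome takes exactly two values and that you know which value represents the event.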

In this chapter, we explain logistic regression. We describe the circumstances under which to use it, the important related concepts, how to run it with software, and how to interpret the output. We also point out the pitfalls of logistic regression and show you how to determine the sample size you need to fit such a model.

Using Logistic Regression

Following are typical uses of logistic regression analysis:

To test whether one or more predictors and an outcome are statistically significantly associated. For example, to test whether age and/or obesity status are associated with an increased likelihood of being diagnosed with Type II diabetes.

To overcome the limitations of the 2x2 cross-tab method (described in Chapter 12), which can analyze only one predictor at a time (and the predictor has to be binary). With logistic regression, you can analyze multiple predictor variables at a time. Each predictor can be a numeric variable or a categorical variable having two or more levels.

To quantify the extent or magnitude of an association between a particular predictor and an outcome once that association has been established. In other words, you are seeking to